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Abstract 

We introduce a data-based approach to estimating key quantities which arise in the study of 
nonlinear control systems and random nonlinear dynamical systems. Our approach hinges on the 
observation that much of the existing linear theory may be readily extended to nonlinear systems 
- with a reasonable expectation of success - once the nonlinear system has been mapped into a 
high or infinite dimensional feature space. In particular, we develop computable, non-parametric 
estimators approximating controllability and observability energy functions for nonlinear systems, 
and study the ellipsoids they induce. In all cases the relevant quantities are estimated from simulated 
or observed data. It is then shown that the controllability energy estimator provides a key means for 
approximating the invariant measure of an ergodic, stochastically forced nonlinear system. 

1 Introduction 

Personal computing has developed to the point where in many cases it ought to be easier to simulate a 
dynamical system and analyze the empirical data, rather than attempt to study the system analytically. 
Indeed, for large classes of nonlinear systems, numerical analysis may be the only viable option. Yet the 
mathematical theory necessary to analyze dynamical systems on the basis of observed data is still largely 
underdeveloped. In previous work the authors proposed a linear, data-based approach for model reduction 
of nonlinear control systems [3j. The approach is based on lifting simulated trajectories of the system 
into a high or infinite dimensional feature (Hilbert) space where the evolution of the original system may 
be reasonably modeled as linear. One may then implicitly carry out linear balancing, truncation, and 
model reduction in the feature space while retaining nonlinearity in the original statespace. 

In this paper, we continue under this setting and explore data-based definitions of key concepts for 
nonlinear control and random dynamical systems. We propose an empirical approach for estimation of 
the controllability and observability energies for stable nonlinear control systems, as well as invariant 
measures and their supports for ergodic nonlinear stochastic differential equations. Our methodology ap- 
plies the relevant linear theory in a feature space where it is assumed that the original nonlinear system 
behaves approximately linearly. In this case we leverage the well-known connection between the controlla- 
bility gramian of a linear control system and the invariant measure associated to the corresponding linear 
stochastic differential equation. This relationship was previously identified as useful for finding the con- 
trollability energy for certain nonlinear control systems given the invariant measure of the corresponding 
randomly forced dynamical system (23]. The approach in [23], however, requires solving a Fokker-Planck 
equation and so applies to only a narrow class of systems. Our approach takes the reverse direction in a 
data-driven setting: given an empirical estimate of the controllability energy function, one can obtain an 
estimate of the invariant measure. In particular, we will propose a consistent, data-based estimator for 

*An abbreviated version of this report will appear in Proc. American Control Conference (ACC), 2012. 
TJ. Bouvrie is with the Department of Mathematics, Duke University, Durham, NC 27708, USA jvb@math.duke.edu 
tB. Hamzi is with the Department of Mathematics, Imperial College London, London, SW7 2AZ, UK 
b . hamziOimperial .ac.uk 



the controllability energy function of a nonlinear control system, and show how it can be used to estimate 
the invariant measure and its support for the corresponding stochastic differential equation (SDE). 

The essential point of this paper is to illustrate that it is possible to find data-based estimates of 
linear objects in order to understand nonlinear control and random dynamical systems, without having 
to solve a Hamilton- Jacobi-Bellman or Lyapunov equation in the case of nonlinear control systems, or a 
Fokker-Planck equation in the case of nonlinear SDEs. The approach proposed here also highlights the 
close interaction between control and random dynamical systems and demonstrates how control theoretic 
objects can be useful for studying random dynamical systems. Our contribution should be seen as a 
step towards developing a mathematical, data-driven theory for dynamical systems which can be used 
to analyze and predict random dynamical systems, as well as offer data-driven control strategies for 
nonlinear systems on the basis of observed data rather than a pre-specified model. 



2 Linear Systems as a Paradigm for Working in RKHS: 
Background 

In this section we give a brief overview of some important background concepts in linear control, ran- 
dom dynamical systems and reproducing kernel Hilbert spaces (RKHS). We will make use of the linear 
theory that follows after mapping the state variable of a nonlinear system into a suitable RKHS, thereby 
harnessing RKHS theory as a framework for extending linear tools to nonlinear systems. The following 
background material closely follows [TU [22 [5] . 



2.1 Linear Control Systems 

Consider a linear control system 

x = Ax + Bu 

y = Cx ' ( } 

where x £ E™, u £ R q , y £ R p , (A,B) is controllable, (A, C) is observable and A is Hurwitz. We 
define the controllability and the observability Gramians as, respectively, W c = J °° e At BB T e A * dt, 

W = f °° e A t C T Ce At dt. These two matrices can be viewed as a measure of the controllability and 
the observability of the system [22]. For instance, consider the past energy [27], L c (xq), defined as the 
minimal energy required to reach xo from in infinite time 

L c (x )= inf \ [ \\u(t)\\ 2 dt, (2) 

uGL2( — oo,0), Z J_ co 
%(— oo)= 0,x(0)— Xq 

and the future energy [27], L Q (xo), defined as the output energy generated by releasing the system from 
its initial state x(to) = xo, and zero input u(t) — for t > 0, i.e. 

1 r°° 

L (xo) = - J o \\y(t)fdt, (3) 

for x(to) = xq and u(t) — 0, t > 0. 

In the linear case, it can be shown that 

L c (x ) = I^W-^o, (4) 
L {xq) = \x T W o x . (5) 

Moreover, W c and W satisfy the following Lyapunov equations |12| : 

AW C + W C A T = -BB T , 

A T Wo + W A = -C T C. 1 ' 

These energies are directly related to the controllability and observability operators. 
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Definition 2.1. [T^] Given a matrix pair (A,B), the controllability operator fy c is denned as 

* c : L 2 (-oo,0) C" 

u n- /_ e~ Ar Bu(r)dT 

The significance of this operator is made evident via the following optimal control problem: Given 
the linear system x(t) = Ax(t) + Bu(t) defined for t G (—00, 0) with x(— 00) = 0, and for x(0) G C ra with 
unit norm, what is the minimum energy input u which drives the state x(t) to x(0) = x$ at time zero? 
That is, what is the u G Li{— 00, 0] solving ^ c u = xq with smallest norm ||w||2? If (A, B) is controllable, 
then ^ c^* c ='■ W c is nonsingular, and the answer to the preceding question is 

u op t := KW~ l x . (7) 

The input energy is given by 

= (W^Xo^cKW^Xo) 

= x%W- x x Q . 

Moreover, the reachable set through u pt, i- e - the final states xq = f c u that can be reached given an 
input u G L2(—oo, 0] of unit norm, {"J^ti '■ u G 00, 0] and \\u\\2 < 1} may be defined as 

K := {W?x c : x c G C" and ||se c || < 1}. (8) 
Similarly, for the autonomous system 

X = Ax, X(0) = X G C'\ 

y = Cx 

where A is Hurwitz, the observability operator is defined as follows. 

Definition 2.2. [T^] Given a matrix pair (A,C), where A is Hurwitz, the observability operator ^ Q is 
defined as 

* :C" -> L 2 (0,oo) 

f Ce At a;o, for t > 
T ° I 0, otherwise 

The corresponding observability ellipsoid is given by 

£ := {WSxq : x G C" and ||x || = 1}. 
The energy of the output signal y = ^ xq, for Xq G C" can then be computed as 
\\y\\l = (^ xo^ xq) = (xo^l^oxo) = (x ,W xo) 
where : L 2 [0,oo) -» C n is the adjoint of 

2.2 Linear Stochastic Differential Equations 

In this section, we review the relevant background for stochastically forced differential equations (see 
e.g. [24, 5J for more detail). Here we will consider stochastically excited dynamical control systems affine 
in the input u G K 9 

x = f(x) + G{x)u , (9) 

where G : W 1 — > M. nxq is a smooth matrix- valued function and x G R". We replace the control inputs by 
sample paths of white Gaussian noise processes, giving the corresponding stochastic differential equation 

dX t = f(X t )dt + G(X t )dwi q) (10) 
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with W t iq) a q— dimensional Brownian motion. The solution X t to this SDE is a Markov stochastic 
process with transition probability density p(t, x). The time evolution of the probability density p(t,x) 
is described by the Fokker- Planck (or Forward Kolmogorov) equation 

j,k=l ■> 

The differential operator C on the right-hand side is referred to as the Fokker-Planck operator associated 
to (|10p . The steady-state probability density for (|10p is a solution of the equation 



C Poo = 0. (12) 

In the context of linear Gaussian theory where we are given an n— dimensional system of the form 

dX t = AX t dt + BdW t iq) , (13) 

with A £ W ixn , B € R nX9 , the transition density is Gaussian. It is therefore sufficient to find the mean 
and covariance of the solution X(t) in order to uniquely determine the transition probability density. The 
mean satisfies f t E[X] = AE[X] and thus E[X(t)] = e At E[X(0)]. If A is Hurwitz, limt^ E[X(t)] = 0. 
The covariance satisfies f t E[XX T ] = AE[XX T ] + E[XX T ]A + BB T . This formulation gives the steady- 
state distribution's covariance matrix as Q = lim^oo E[X t Xj] so that we may find Q by solving the 
Lyapunov system AQ+ QA T = —BB T . Thus the solution Q is exactly the controllability gramian 

poo 

Q = W C = / e At BB T e ATt dt 



Jo 

which is positive iff. the pair (A,B) is controllable [5]. Combining the above facts, the steady-state 
probability density is given by 

Poo (x) = Z- 1 e-i* Tw ° lx = Z- Y e- L ^) (14) 

using © and letting Z = v /(27r)"det(VF c ). 

Equation (|14[) suggests the following key observations in the linear setting: 

• Given an approximation L c of L c we obtain an approximation for of the form 

Poo oc e- £ "^ (15) 

• Given an approximation p^ of p^ we obtain an approximation for L c (x) by solving 

L c (x) = -\n[p 00 {x)]+C. (16) 

We note that these approximations have been used in different contexts to study nonlinear control 
and random dynamical systems. For instance, in [B], Equation (fTS"]) was used to find explicit solutions of 
the Fokker-Planck equation for systems where a Lyapunov equation for the unforced system can be found 
and solved. In [23J, Equation ([16]) was used, given an explicit solution to the Fokker-Planck equation, to 
approximate the controllability energy and subsequently applied to the problem of model reduction for 
nonlinear control systems. 

Although the above relationship between p^ and L c holds for only a small class of systems (e.g. 
linear and some Hamiltonian systems), by mapping a nonlinear system into a suitable reproducing kernel 
Hilbert space we may reasonably extend this connection to a broad class of nonlinear systems. We will 
return to this topic in Section [S] after defining kernel Hilbert spaces and introducing gramians in RKHS. 



4 



2.3 Reproducing Kernel Hilbert Spaces 



We give a brief overview of reproducing kernel Hilbert spaces as used in statistical learning theory. 
The discussion here borrows heavily from [5J [2FJ ISO]- Early work developing the theory of RKHS was 
undertaken by N. Aronszajn [T]. 

Definition 2.3. Let Ti be a Hilbert space of functions on a set X. Denote by (f,g) the inner product 
on Ti and let ||/|| = (/, f) 1 ^ 2 be the norm in Ti, for / and g £ Ti. We say that Ti is a reproducing kernel 
Hilbert space (RKHS) if there exists a function A : X x X — > R such that 

i. K x := K(x, ■)£% for all x £ X. 

ii. A spans Ti: Ti = span{A" x | x £ X}. 

iii. A has the reproducing property: V/ £ Ti, f(x) — (f,K x ). 

K will be called a reproducing kernel of Ti. T-Lk will denote the RKHS Ti with reproducing kernel A 
where it is convenient to explicitly note this dependence. 

Definition 2.4 (Mercer kernels). A function A : X x X —> K is called a Mercer kernel if it is continuous, 
symmetric and positive definite. 

The important properties of reproducing kernels are summarized in the following proposition. 

Proposition 2.1. If A' is a reproducing kernel of a Hilbert space Ti, then 

i. K(x, y) is unique. 

ii. Vx,y £ X, K(x,y) = K{y,x) (symmetry). 

iii. j—i OH<%jK(%ii x j) > for on £ R, Xi £ X and q £ N + (positive definiteness). 

iv. (K(x, •), K(y, •)) = K(x, y). 

Common examples of Mercer kernels defined on a compact domain X C R" are K(x, y) — x ■ y 
(Linear), K(x, y) = (1 + x ■ y) d for d £ IN + (Polynomial), and K(x, y) = e — II 33- ^ Ha/ 17 ' , cr > (Gaussian). 

Theorem 2.1. Let if:/fx^->lRbea symmetric and positive definite function. Then there exists a 
Hilbert space of functions H defined on X admitting A as a reproducing Kernel. Conversely, let T-L be a 
Hilbert space of functions / : X — > R satisfying Vx € X,3k x > 0, such that |/(x)| < k^H/H-h, V/ € Ti. 
Then "H has a reproducing kernel A. 

Theorem 2.2. Let K(x,y) be a positive definite kernel on a compact domain or a manifold A. Then 
there exists a Hilbert space T and a function <i> : X — > T such that 

K(x,y) = for x,y £ X. 

<f> is called a feature map, and J 7 a feature spac^E 

Given Theorem l2.21 and property [iv.] in Proposition ^. 1[ note that we can take := A^. := K(x, •) 
in which case J- = Ti - the "feature space" is the RKHS itself, as opposed to an isomorphic space. We will 
make extensive use of this feature map. The fact that Mercer kernels are positive definite and symmetric is 
also key; these properties ensure that kernels induce positive, symmetric matrices and integral operators, 
reminiscent of similar properties enjoyed by gramians and covariance matrices. Finally, in practice one 
typically first chooses a Mercer kernel in order to choose an RKHS: Theorem 12 . 1 1 guarantees the existence 
of a Hilbert space admitting such a function as its reproducing kernel. 

A key observation however, is that working in RKHS allows one to immediately find nonlinear versions 
of algorithms which can be expressed in terms of inner products. Consider an algorithm expressed in 

1 The dimension of the feature space can be infinite, for example in the case of the Gaussian kernel. 
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terms of the inner product (x,x')x with x,x' G X. Now assume that instead of looking at a state x, we 
look at its $ image in W, 

* 1 X ^ n (17) 
a; H> $(x) . v ; 

In the RKHS, the inner product (<&(x), $(x')) is 

($(x),$(x')) = #(x,x') (18) 

by the reproducing property. Hence, a nonlinear variant of the original algorithm may be implemented 
using kernels in place of inner products on X. 



3 Empirical Gramians in RKHS 

In this Section we recall empirical gramians for linear systems |22| . as well as a notion of empirical 
gramians for nonlinear systems in RKHS introduced in |3|- The goal of the construction we describe here 
is to provide meaningful, data-based empirical controllability and observability gramians for nonlinear 
systems. In [3J, observability and controllability gramians were used for balanced model reduction, 
however here we will use these quantities to analyze nonlinear control properties and random dynamical 
systems. We note that a related notion of gramians for nonlinear systems is briefly discussed in |16| . 
however no method for computing or estimating them was given. 



3.1 Empirical Gramians for Linear Systems 

To compute the Gramians for the linear system ([T]), one can attempt to solve the Lyapunov equations 
© directly although this can be computationally prohibitive. For linear systems, the gramians may be 
approximated by way of matrix multiplications implementing primal and adjoint systems (see the method 
of snapshots, e.g. |26|). Alternatively, for any system, linear or nonlinear, one may take the simulation 
based approach introduced by B.C. Moore [52] for reduction of linear systems, and subsequently extended 
to nonlinear systems in [20J . The method proceeds by exciting each coordinate of the input with impulses 
from the zero initial state (xq = 0). The system's responses are sampled, and the sample covariance is 
taken as an approximation to the controllability gramian. Denote the set of canonical orthonormal basis 
vectors in W 1 by {e^ji. Let u l (t) = 8(t)ti be the input signal for the i-th simulation, and let x l {t) be the 
corresponding response of the system. Form the matrix X{t) = [x 1 (i) • ■ ■ x q (t)] € K" x<z , so that X(t) is 
seen as a data matrix with column observations given by the respective responses x l {t). Then the (n x n) 
controllability gramian is given by 

W cM = - / X(t)X(t) T dt. (19) 
Q Jo 

We can approximate this integral by sampling the matrix function X{t) within a finite time interval [0, T] 
assuming for instance the regular partition {ti}f =1 , t.- L — (T/N)i. This leads to the empirical controllability 
gramian 

rp N 

W cM = ^rY] X(U)X(U) T . (20) 
Nq^ 

The observability gramian is estimated by fixing u(t) = 0, setting xo = for i = 1, ...,n, and 
measuring the corresponding system output responses y l (t). Now assemble the output responses into 
a matrix Y(t) = [y x (<) • • • y n (t)] € M px ™. The (n x n) observability gramian W . u n and its empirical 
counterpart W^Hn are respectively given by 

W oXm = - / Y(t) T Y(t)dt (21) 
P Jo 
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and 

T N 

where Y(t) = Y(t) T . The matrix Y(t, t ) £ R nx P can be thought of as a data matrix with column 
observations 

djiU) = (y](U),...,y?(U)) T €R n , (23) 

for j = l,...,p, i = 1, . . . ,N so that dj(ti) corresponds to the response at time U of the single output 
coordinate j to each of the (separate) initial conditions xq = eu, k = 1, . . . , n. 

3.2 Empirical Gramians in RKHS Characterizing Nonlinear Sys- 
tems 

Consider the generic nonlinear system 

x = F(x,u) 



y = h(x), (24) 

with x eW 1 , ueW, y e W, F(0) = and h(0) = 0. Assume that the linearization of (@D around the 
origin is controllable, observable and A = ^| x=0 is asymptotically stable. 



RKHS counterparts to the empirical quantities (|20]) , (|22p defined above for the system (|24j) can be 
defined by considering feature- mapped lifts of the simulated samples in T-Lk- In the following, and 
without loss of generality, we assume the data are centered in feature space, and that the observability 
samples and controllability samples are centered separately. See ([IS], Ch. 14) for a discussion on implicit 
data centering in RKHS with kernels. 

First, observe that the gramians W c , W can be viewed as the sample covariance of a collection of 
N ■ q,N ■ p vectors in R™ scaled by T, respectively. Then applying $ to the samples as in (fT7j) . we obtain 
the corresponding gramians in the RKHS associated to K as bounded linear operators on Hk- 

rp N q 
q i=l 3=1 

rp N p 

P i=l 3=1 

where the samples defined in Section I3.1[ and a <g> b = a (b, ■) denotes the tensor product 

in T~L. From here on we will use the notation W c , W to refer to RKHS versions of the true (integrated) 
gramians, and W c , W to refer to RKHS versions of the empirical gramians. 

Let \I> denote the matrix whose columns are the (scaled) observability samples mapped into feature 
space by $, and let $ be the matrix similarly built from the feature space representation of the control- 
lability samples. Then we may alternatively express the gramians above as W c = <&<I> T and W = SI/M/ 7 , 
and define two other important quantities: 

• The controllability kernel matrix K c G R Ar 9 xJV '? of kernel products 

K c = (26) 

(K c )^ = K{x„, x v ) = (QfaJM**))? ( 27 ) 

for fi, v = 1, . . . , Nq where we have re-indexed the set of vectors \xP (tj)}i,j = {x^}^ to use a single 
linear index. 

• The observability kernel matrix K a £ ^NpxNp^ 

K = (28) 
(K )^ = K(d„, d v ) = {^{du)Mdu))j, (29) 
for ji, v = 1, . . . , Np, where we have again re-indexed the set {dj(U)}ij = {d^}^ for simplicity. 
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Note that K C ,K Q may be highly ill-conditioned. The SVD may be used to show that W c and K c (W 
and K Q ) have the same singular values (up to zeros). 

4 Nonlinear Control Systems in RKHS 

In this section, we introduce empirical versions of the controllability and observability energies (HJ)-© 
for stable nonlinear control systems of the form ©, that can be estimated from observed data. Our 
underlying assumption is that a given nonlinear system may be treated as if it were linear in a suitable 
feature space. That reproducing kernel Hilbert spaces provide rich representations capable of capturing 
strong nonlinearities in the original input (data) space lends validity to this assumption. 

In general little is known about the energy functions in the nonlinear setting. However, Scherpen [27] 
has shown that the energy functions L c (x) and L (x) defined in @ and Q satisfy a Hamilton- Jacobi 
and a Lyapunov equation, respectively. 

Theorem 4.1. [27] Consider the nonlinear control system p4|) with F(x,u) = f(x) + G(x)u. If the 
origin is an asymptotically stable equilibrium of f(x) on a neighborhood W of the origin, then for all 
x G W , L (x) is the unique smooth solution of 

fit i 

^( x )f( x ) + ih T (x)h(x)=0, L o (0)=0 (30) 

under the assumption that (|3H|) has a smooth solution on W. Furthermore for all x € W, L c (x) is the 
unique smooth solution of 

fit i Fit fFt 

-Q^wn*) + 2^i {x)9{x)9T{x) ~d^ {x) = °' LM = (31) 

under the assumption that (|31[) has a smooth solution L c on W and that the origin is an asymptotically 
stable equilibrium of —(/(a;) + g(x)g T (x)^^(x)) on W. 

We would like to avoid solving explicitly the PDEs (f3T)|) - (UTT]) and instead find good estimates of their 
solutions directly from simulated or observed data. 

4.1 Energy Functions 

Following the linear theory developed in Section we would like to define analogous controllability and 
observability energy functions paralleling ((4])- {5]), but adapted to the nonlinear setting. We first treat 
the controllability function. Let /Zoo on the statespace X denote the unknown invariant measure of the 
nonlinear system (|24[) when driven by white Gaussian noise. We will consider here the case where the 
controllability samples {xi]^L x are i.i.d. random draws from /j.^, and X is a compact subset of K". The 
former assumption is implicitly made in much of the empirical balancing literature, and if a system is 
simulated for long time intervals, it should hold approximately in practice. If we take $(x) = K x , the 
infinite-data limit of (|23|) is given by 

W C = E^[W C ]= [ ^KjKadHaaix). (32) 

J X 

In general neither W c nor its empirical approximation W c are invertible, so to define a controllability 
energy similar to (H} one is tempted to define L c on % as L c (h) = (W^h, h), where denotes the 
pseudoinverse of the operator A. However, the domain of W\ is equal to the range of W c , and so in 
general K x may not be in the domain of W\. We will therefore introduce the orthogonal projection 
W\W C mapping % n- range(VT c ) and define the nonlinear control energy on % as 



L c (h) = (w\{WlW c )h,h) 



(33) 



We will consider finite sample approximations to (f3"3"|) . however a further complication is that WJW C 
may not converge to W\W C in the limit of infinite data (taking the pseudoinverse is not a continuous 
operation), and W\ can easily be ill-conditioned in any event. Thus one needs to impose regularization, 
and we replace the pseudoinverse with a regularized inverse (A + A/) -1 , A > throughout. We note 
that the preceding observations were also made in [TT] . Intuitively, regularization prevents the estimator 
from overfitting to a bad or unrepresentative sample of data. We thus define the estimator L c : X —> K + 
(that is. on the domain {K x \ x G X} C H) to be 

L c (x) = ±{(W c + \I)- 2 W c K x ,K x ), xeX (34) 

with infinite-data limit 

L x c {x) = \({W c + \I)- 2 W c K Xl K x ), (35) 

where A > is the regularization parameter. 

Towards deriving an equivalent but computable expression for L c defined in terms of kernels, we recall 
the sampling operator S x of [29] and its adjoint. Let x = {xi}™ =1 denote a generic sample of m data 
points. To x we can associate the operators 

S X :H ^R m , hen ^(h(xi),...,h{x m )) 
S* : R m -> H, c g E™ i CiK*t ■ 

If x is the collection of m = Nq controllability samples, one can check that W c = -^S^S X and K c = S X S^. 
Consequently, 

Lc{x) = 1 2((±S* X S X + \I)- 2 ±S*S*K X ,K X ) 

= -k (s:(±s x s* x + xi)- 2 s x k x ,k x ) 

where k c (a;) := S X K X — {K(x, x^)) ^ is the A^q-dimensional column vector containing the kernel 
products between x and the controllability samples. 

Similarly, letting x now denote the collection of m = Np observability samples, we can approximate 
the future output energy by 

L {x) = \{W K x ,K x ) (36) 
= k(SxS x K x ,K x ) 
= ^ (x) T k (x) = ^\\k (x)\\ 2 2 

where k Q (x) := (K(x, d^)) 1 ^^ is the TVp-dimensional column vector containing the kernel products 
between x and the observability samples. We collect the above results into the following definition: 

Definition 4.1. Given a nonlinear control system of the form (|24]) . we define the kernel controllability 
energy function and the kernel observability energy function as, respectively, 

L c (x) = ^-k c (x) T (^- q K c + A/)- 2 k c (x) (37) 

L (x) = ^ ||ko(x)||2 • ( 38 ) 

Note that the kernels used to define L c and L a need not be the same. 

4.2 Consistency 

We'll now turn to showing that the estimator L c is consistent, but note that we do not address the ap- 
proximation error between the energy function estimates and the true but unknown underlying functions. 
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Controlling the approximation error requires making specific assumptions about the nonlinear system, 
and we leave this question open. 

In the following we will make an important set of assumptions regarding the kernel K and the RKHS 
H it induces. 

Assumption 4.1. The reproducing kernel K defined on the compact statespace X C W 1 is locally 
Lipschitz, measurable and defines a completely regular RKHS. Furthermore the diagonal of K is uniformly 
bounded, 

k 2 = supK(x,x) < oo. (39) 

xEX 

Separable RKHSes are induced by continuous kernels on separable spaces X. Since X C K™ is 
separable and locally Lipschitz functions are also continuous, T-L will always be separable. Completely 
regular RKHSes are introduced in [TT| and the reader is referred to this reference for details. Briefly, 
complete regularity ensures recovery of level sets of any distribution, in the limit of infinite data. The 
Gaussian kernel does not define a completely regular RKHS, but the L\ exponential and Laplacian kernels 
do [IT]. 

We introduce some additional notation. Let W c ^ n denote the empirical RKHS gramian formed from 
a sample of size m observations, and let the corresponding control energy estimate in Definition 14.11 
involving W C:Tn and regularization parameter A be denoted by L* m . 

The following preliminary lemma provides finite sample error bounds for Hilbert-Schmidt covariance 
matrices on real, separable reproducing kernel Hilbert spaces. 

Lemma 4.1 ([25] Theorem 7; Props. 8, 9). 

(i) The operators W c , W c , m are Hilbert-Schmidt. 

(ii) Let 5 e (0, 1]. With probability at least 1 - 8, 

\\W c -W c , m \\ HS < 2 -^£log^l (40) 
° Jm o 



The following theorem establishes consistency of the estimator L\ , the proof of which follows the 
method of integral operators developed by |29[ [7] and subsequently adopted in the context of density 
estimation by ([H], Theorem 1). 

Theorem 4.2. 

(i) Fix A > 0. For each x € X , with probability at least 1 — 5, 

| lUl) - i;W |<W) W /.?. (41) 

1 1 AVm o 



(ii) If (K, X, fioo) is such that 

sap\\W*{W*W c )K x \\ n <oo, (42) 
xex 

then for all x 6 X, 

lim\L*(x)-L c (x)\=0. 

A— >0 

(iii) If the condition (|4^|) holds and the sequence {A m } m satisfies lim A m = with lim =2E — _E1 = o, 
then 

lim \L^ m (x) — L c (x)\ = 0, almost surely. 
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Proof. For (i), the sample error, we have 



2\L* m (x) - L*(x)\ < \\(W c , m + XI)- 2 W c , m 
< \\(W C + XI)- 2 [X 2 (W C , 

K 2 (X 2 +K i ) 



(W C + XI)- 2 W C \\ \\K X \\ 2 H 

- W c ) + W C {W C - W Cim )W c>m ](W Ctm + xi)- 2 \\ K 2 



< 



A 4 



\W C . 



w c 



\HS 



where ||-|| refers to the operator norm. The second inequality follows from spectral calculus and (|39|) . The 
third line follows making use of the estimates (W C)Wl + -^-0~ 2 || — ^ || (W" c + -^-0~ 2 || — ^~ 2 > \\W c \\ns < 
k 2 , || W c ,m || if s < k 2 (and the fact that A > so that the relevant quantities are invertible). Part (i) then 
follows applying Lemma l4TTl to the quantity || W c>m — W C \\ HS . For (ii), the approximation error, note that 
the compact self-adjoint operator W c can be expanded onto an orthonormal basis {oj, </>,-}. We then have 

2\L x c {x) - L c {x)\ = \([{W c + XT)- 2 W c -Wl{WlW c )]K x ,K x )\ 



< 



aE 

i:o-;>0 



(*i + A) 2 1 

2(7; + X 



(<r< + X) 2 a, 



■\(<Ih,K x )\ 



i:<Ji>0 
2 



The last quantity above can be seen to converge to as A —> since the sum converges for all x under 
the condition P^|) . Lastly for part (iii), we see that if m —> oo and A 2 — > slower than y/m then the 
sample error (i) goes to while (ii) also holds. For almost sure convergence in part (i), we additionally 
require that for any e € (0, oo), 



J2H\Li m (x)-L x c (x)\>e)<J2^ 



< oo. 



The choice X m = log ' m satisfies this requirement, as can be seen from the fact that for large enough 



M < 



E 



m>M 



%l log 



<E 



m>M 



< OO. 



□ 



We note that the condition (|42[) required in part (ii) of the theorem has also been discussed in the 
context of support estimation in forthcoming work from the authors of 



4.3 Observability and Controllability Ellipsoids 

Given the preceding, we can estimate the reachable and observable sets of a nonlinear control system as 
level sets of the RKHS energy functions L c , L a from Definition 14.11 

Definition 4.2. Given a nonlinear control system ©, its reachable set can be estimated as 

K T = {x e X | L c (x) < t} (43) 

and its observable set can be estimated as 

tr> = {x € X | L (x) > t'} (44) 

for suitable choices of the threshold parameters t, t' . 

If the energy function estimates above are replaced with the true energy functions, and r = r' = 
1/2, one obtains a finite sample approximation to the controllability and observability ellipsoids defined 
in Section 12.11 if the system is linear. In general, r may be chosen empirically based on the data, 
using for instance a cross-validation procedure. Note that in the linear setting, the ellipsoid of strongly 

observable states is more commonly characterized as {x | x T W~ 1 x < 1} = {Wo x \ ||x|| < 1}; hence the 
definition flUJ). 
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5 Estimation of Invariant Measures for Ergodic Nonlin- 
ear SDEs 



In this Section we consider ergodic nonlinear SDEs of the form (|10[) . where the invariant (or "stationary") 
measure is a key quantity providing a great deal of insight. In the context of control, the support of 
the stationary distribution corresponds to the reachable set of the nonlinear control system and may 
be estimated by (|4"5]l . Solving a Fokker-Planck equation of the form f| 1 1 [) is one way to determine 
the probability distribution describing the solution to an SDE. However, for nonlinear systems finding 
an explicit solution to the Fokker-Planck equation -or even its steady-state solution- is a challenging 
problem. The study of existence of steady-state solutions can be traced back to the 1960s [HI I5T] . 
however explicit formulas for steady-state solutions of the Fokker-Planck equation exist in only a few 
special cases (see [SJ [TU1 [HI El [3TJ [23] for example) . Such systems are often conservative or second order 
vector-fields. Hartmann [T8] among others has studied balanced truncation in the context of linear SDEs, 
where empirical estimation of gramians plays a key role. 

We propose here a data-based non-parametric estimate of the solution to the steady-state Fokker- 
Planck equation (ITS]) for a nonlinear SDE, by combining the relation (fTS"| with the control energy esti- 
mate (|55]1 . Following the general theme of this paper, we make use of the theory from the linear Gaussian 
setting described in Section [2.21 but in a suitable reproducing kernel Hilbert space. Other estimators have 
of course been proposed in the literature for approximating invariant measures and for density estimation 
from data more generally (see e.g. [H[T31[T21[IH1[II])) however to our knowledge we are not aware of any 
estimation techniques which combine RKHS theory and nonlinear dynamical control systems. An advan- 
tage of our approach over other non-parametric methods is that an invariant density is approximated by 
way of a regularized fitting process, giving the user an additional degree of freedom in the regularization 
parameter. 

Our setting adopts the perspective that the nonlinear stochastic system (|10l) behaves approximately 
linearly when mapped via $ into the RKHS "H, and as such may be modeled by an infinite dimensional 
linear system in %. Although this system is unknown, we know that it is linear and that we can estimate 
its gramians and control energies from observed data. Furthermore, we know that the invariant measure 
of the system in H, is zero-mean Gaussian with covariance given by the controllability gramian. Thus the 
original nonlinear system's invariant measure on X should be reasonably approximated by the pullback 
along $ of the Gaussian invariant measure associated with the linear infinite dimensional SDE in %. 

We summarize the setting in the following modeling Assumption: 

Assumption 5.1. Let T-L be a real, possibly infinite dimensional RKHS satisfying Assumption 14.11 

(i) Given a suitable choice of kernel K, if the Revalued stochastic process x(t) is a solution to the (er- 
godic) stochastically excited nonlinear system (|10p . the "H-valued stochastic process (<!> o x)(t) =: X(t) 
can be reasonably modeled as an Ornstein-Uhlenbeck process 

dX(t) = AX(t)dt + \fCdW{t), X(0) =0eH (45) 

where A is linear, negative and is the infinitesimal generator of a strongly continuous semigroup 
e tA , C is linear, continuous, positive and self-adjoint, and W(t) is the cylindrical Wiener process. 

(ii) The measure Poo is the invariant measure of the OU process (|45|) and Poo is the pushforward along 
$ of the unknown invariant measure /^oo on the statespace X we would like to approximate. 

(iii) The measure ^ioo is absolutely continuous with respect to Lebesgue measure, and so admits a density. 

We will proceed in deriving an estimate of the invariant density under these assumptions, but note that 
there are interesting systems for which the assumptions may not always hold in practice. For example, 
uncontrollable systems may not have a unique invariant measure. In these cases one must interpret the 
results discussed here as heuristic in nature. 
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It is known that a mild solution X(t) to the SDE (|4"5")) exists and is unique (JTUJ , Thin. 5.4. pg. 121). 
Furthermore, the controllability gramian associated to (|45j) 

poo 

W c h= / e tA Ce tA "hdt, heH (46) 
Jo 

is trace class ([SJ, Lemma 8.19), and the unique measure invariant with respect to the Markov 
semigroup associated to the OU process has characteristic function ([9], Theorem 8.20) 

P^Qi) = exp(-i(W c /i,ft)), heH. (47) 

We will use the notation P to refer to the Fourier transform of the measure P. The law of the solution 
X(t) to problem (|45j) given initial condition X(0) = is Gaussian with zero mean and covariance operator 
Q t = f*e sA Ce sA 'ds. Thus 

W c = lim E[I(t)®I(il] 

t— >oo 

= f {;h)hdP 00 {h) 
Jn 

(■,K X ) Kxdficoix) 

where the last integral follows pulling P^ back to X via $, establishing the equivalence between (|4l?)) 
and d22). 

Given that the measure P^ has Fourier transform (|47|) and by Assumption 15.11 is interpreted as the 
pushforward of ^oo (that is, for Borel sets B e 8(71), Poo{B) = ($*^oo)(-B) = p o(^ 1 {B)) formally), 
we have that JLoo(x) = exp(-i (W C K X ,K X )). 

The invariant measure /ioo is defined on a finite dimensional space, so together with part (iii) of 
Assumption 15.11 we may consider the corresponding (Radon-Nikodym) density 

Poo (x) cc exp(-i {Wl{WlW c )K x ,K x )) 

whenever the condition (|4^|) holds. If P2l does not hold or if we are considering a finite data sample, 
then we regularize to arrive at 

Poo {x) oc exp(-i ((W c + \I)~ 1 K X ,K X )) (48) 

as discussed in Section [2~2l (see Eq.[T5|) and Section l4~Tl This density may be estimated from data {xi}fL 1 
since the controllability energy may be estimated from data: at a new point x, we have 

Poo (x) = Z- 1 exp(-L e (x)) (49) 

where L c is the empirical approximation computed according to Definition 14.11 and the constant Z may 
be either computed analytically in some cases or simply estimated from the data sample to enforce 
summation to unity. We may also estimate, for example, level sets of (such as the support) by 
considering level sets of the regularized control energy function estimator, {x G X \ L Ctm (x) < r}. 

6 Conclusion 

To summarize our contributions, we have introduced estimators for the controllability/observability en- 
ergies and the reachable/observable sets of nonlinear control systems. We showed that the controllability 
energy estimator may be used to approximate the stationary solution of the Fokker-Planck equation 
governing nonlinear SDEs (and its support). 

The estimators we derived were based on applying linear methods for control and random dynamical 
systems to nonlinear control systems and SDEs, once mapped into an infinite-dimensional RKHS acting 
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as a "linearizing space". These results collectively argue that there is a reasonable passage from linear 
dynamical systems theory to a data-based nonlinear dynamical systems theory through reproducing kernel 
Hilbert spaces. 

We leave for future work the formulation of data-based estimators for Lyapunov exponents and the 
controllability /observability operators ty c , ^ associated to nonlinear systems. 
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